Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images
نویسندگان
چکیده
In this paper, a novel approach for word extraction and character segmentation from the handwritten Bangla document images is reported. At first, a modified Run Length Smoothing Algorithm (RLSA), called Spiral Run Length Smearing Algorithm (SRLSA), is applied for the extraction of words from the text lines of unconstrained handwritten Bangla document images. This technique has helped to overcome some of the drawbacks of standard horizontal and vertical RLSA techniques. SRLSA technique has been applied on the Bangla handwritten document image database CMATERdb1.1.1 and the success rate of the word extraction is found to be 86.01%. In the second part of the work, we have presented a useful solution to the problem on how best word images of handwritten Bangla script can be segmented into constituent characters. Moreover, the technique can segment the words having discontinuity in Matra, a prominent feature of Bangla script. It also optimizes the trade-off between under/over segmentation as Matra region and segmentation points are estimated more precisely. As a result, better word segmentation accuracy is achieved with minimal data loss. Here, a success rate of 92.48% is observed on a dataset of 750 handwritten Bangla words which is 3.35% higher than that of our earlier techniques.
منابع مشابه
An improved offline handwritten character segmentation algorithm for Bangla script
Effective segmentation of offline handwritten word images of unconstrained handwritten Bangla script is a challenging problem in Optical Character Recognition (OCR) application. Presence of a continuous horizontal line called ‘Matra’ is an important feature of this script. However, in unconstrained cursive handwriting, Matra can be wavy or discontinuous, makes the problem of segmentation diffic...
متن کاملA Script Independent Technique for Extraction of Characters from Handwritten Word Images
A script independent character segmentation from word images technique has been reported here. Word to character segmentation is an important preprocessing step of optical character recognition process. But in case of handwritten text, presence of touching characters decreases the accuracy of the technique of the segmentation of the characters from the word. In this paper, segmentation of handw...
متن کاملSegmentation of Bangla Unconstrained Handwritten Text
To take care of variability involved in the writing style of different individuals in this paper we propose a robust scheme to segment unconstrained handwritten Bangla texts into lines, words and characters. For line segmentation, at first, we divide the text into vertical stripes. Stripe width of a document is computed by statistical analysis of the text height in the document. Next we determi...
متن کاملPerformance of Statistics Based Line Segmentation System for Unconstrained Handwritten Text
Handwritten character recognition is a technique by which a computer system could recognize characters and other symbols written in natural handwriting. Segmentation decomposes the document image into subcomponents like lines, words and characters. To achieve greater accuracy, segmentation and recognition could not be treated independently. Most of the existing line segmentation methods have li...
متن کاملZone-based Keyword Spotting in Bangla and Devanagari Documents
In this paper we present a word spotting system in text lines for offline Indic scripts such as Bangla (Bengali) and Devanagari. Recently, it was shown that zone-wise recognition method improves the word recognition performance than conventional full word recognition system in Indic scripts [29]. Inspired with this idea we consider the zone segmentation approach and use middle zone information ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- J. Intelligent Systems
دوره 20 شماره
صفحات -
تاریخ انتشار 2011